Capturing Interactive Data Transformation Operations Using Provenance Workflows
Abstract
The ready availability of data is increasing the opportunities to re-use them for new applications and analyses. Most of these data are not necessarily in the format users want; they are usually heterogeneous and highly dynamic, so re-purposing them requires data transformation. Interactive data transformation (IDT) tools are becoming readily available and lower the barriers to such transformation efforts. This paper describes a principled way to capture the data lineage of interactive data transformation processes. We provide a formal model of IDT, its mapping to a provenance representation, and its implementation and validation on Google Refine. Making the sequence of transformation operations available allows data quality to be assessed and ensures portability between IDT and other data transformation platforms. When evaluated against a set of requirements used to assess systems that provide provenance management solutions, the proposed model showed a high level of coverage.
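The abstract describes mapping interactive transformation steps to a provenance representation. The Python sketch below illustrates the general idea under stated assumptions: each user operation is recorded as an activity that used one dataset version and generated the next, loosely following the used/wasGeneratedBy relations of the W3C PROV data model. The row-of-dicts table format, the two example operations, and the names `ProvenanceLog`, `apply`, `lineage`, and `replay` are all hypothetical; this is not the paper's formal model or Google Refine's API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical representation: a table is a list of rows keyed by column name.
Table = List[Dict[str, str]]


def rename_column(table: Table, old: str, new: str) -> Table:
    """Example IDT operation: rename a column in every row."""
    return [{(new if k == old else k): v for k, v in row.items()} for row in table]


def to_upper(table: Table, column: str) -> Table:
    """Example IDT operation: upper-case the values of one column."""
    return [{**row, column: row[column].upper()} for row in table]


# Registry of operations by name, so a recorded history can be replayed.
OPS: Dict[str, Callable[..., Table]] = {
    "rename_column": rename_column,
    "to_upper": to_upper,
}


@dataclass
class Activity:
    """One user operation, recorded as a provenance activity."""
    name: str
    params: Dict[str, str]
    used: str       # id of the dataset version the operation read
    generated: str  # id of the dataset version the operation produced


@dataclass
class ProvenanceLog:
    activities: List[Activity] = field(default_factory=list)
    current: str = "v0"  # id of the latest dataset version

    def apply(self, table: Table, name: str, **params: str) -> Table:
        """Run a named operation and record its used/generated links."""
        result = OPS[name](table, **params)
        new_version = f"v{len(self.activities) + 1}"
        self.activities.append(Activity(name, dict(params), self.current, new_version))
        self.current = new_version
        return result

    def lineage(self) -> List[str]:
        """Walk derivation links back from the latest version to v0."""
        by_output = {a.generated: a for a in self.activities}
        steps, version = [], self.current
        while version in by_output:
            a = by_output[version]
            steps.append(f"{a.used} --{a.name}--> {a.generated}")
            version = a.used
        return list(reversed(steps))

    def replay(self, table: Table) -> Table:
        """Re-apply the recorded operation sequence to another table."""
        for a in self.activities:
            table = OPS[a.name](table, **a.params)
        return table


log = ProvenanceLog()
data = [{"name": "ada"}, {"name": "alan"}]
data = log.apply(data, "rename_column", old="name", new="person")
data = log.apply(data, "to_upper", column="person")
print(data)           # [{'person': 'ADA'}, {'person': 'ALAN'}]
print(log.lineage())  # ['v0 --rename_column--> v1', 'v1 --to_upper--> v2']
```

Recording each step by operation name and parameters, rather than storing intermediate tables, is what makes `replay` possible: the same history can be re-run on a different dataset, which is one way to read the abstract's portability claim.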
Similar resources
A Provenance-Based Fault Tolerance Mechanism for Scientific Workflows
Capturing provenance information in scientific workflows is not only useful for determining data-dependencies, but also for a wide range of queries including fault tolerance and usage statistics. As collaborative scientific workflow environments provide users with reusable shared workflows, collection and usage of provenance data in a generic way that could serve multiple data and computational...
Using Provenance to Streamline Data Exploration through Visualization
Scientists are faced with increasingly larger volumes of data to analyze. To analyze and validate various hypotheses, they need to create insightful visual representations of both observed data and simulated processes. Often, insight comes from comparing multiple visualizations. But data exploration through visualization requires scientists to assemble complex workflows—pipelines consisting of ...
On the Use of Abstract Workflows to Capture Scientific Process Provenance
Capturing provenance about artifacts produced by distributed scientific processes is a challenging task. For example, one approach to facilitate the execution of a scientific process in distributed environments is to break down the process into components and to create workflow specifications to orchestrate the execution of these components. However, capturing provenance in such an environment,...
Logical Provenance in Data-Oriented Workflows (Long Version)
We consider the problem of defining, generating, and tracing provenance in data-oriented workflows, in which input data sets are processed by a graph of transformations to produce output results. We first give a new general definition of provenance for general transformations, introducing the notions of correctness, precision, and minimality. We then determine when properties such as correctness...
Distinguishing Provenance Equivalence of Earth Science Data
Reproducibility of scientific research relies on accurate and precise citation of data and the provenance of that data. Earth science data are often the result of applying complex data transformation and analysis workflows to vast quantities of data. Provenance information of data processing is used for a variety of purposes, including understanding the process and auditing as well as reproduci...
Publication date: 2012